
    The Effects of Fundamental Frequency Contours on the Intelligibility Benefit of Clear Speech in Native Speakers of American English and Native Speakers of Seoul Korean

    University of Minnesota Ph.D. dissertation. August 2019. Major: Speech-Language-Hearing Sciences. Advisor: Robert Schlauch. 1 computer file (PDF); ix, 105 pages.
    Clear speech, which is characterized by specific acoustic changes that distinguish it from ordinary conversational speech, is a speaking strategy that enhances a talker's intelligibility in adverse listening conditions. An increase in the range of a talker's fundamental frequency (F0) is one of the several acoustic changes observed when clear speech is produced. Although an increase in F0 range is often seen in clear speech, its contribution to the clear speech benefit is unknown. Experiment 1 in this dissertation examined whether an increase in F0 variation contributes significantly to the clear speech benefit in native speakers of American English. Experiment 2 evaluated clear speech effects in native speakers of Seoul Korean who started to learn English after the age of six and also examined the role of F0 variation. The clear speech benefit was measured by having talkers produce sentences in a conversational and a clear speaking style. The stimuli for these experiments were produced by several talkers and recorded digitally. At the time of recording, participants were instructed to read aloud low-context sentences in conversational (Experiments 1 & 2), clear (Experiments 1 & 2), and exaggerated-F0 (Experiment 2 only) speaking styles. The exaggerated speaking style, which resembles infant-directed speech with its wide F0 range, was given to the native Korean talkers because Korean talkers do not typically vary their F0 much across speaking styles. To characterize acoustic-phonetic changes at the sentence level in the talkers' productions, five acoustic measures were taken: speech rate, long-term spectra, F0 distribution, vowel formant frequencies, and vocal intensity level. Sentences from the talkers were presented to native listeners of American English in a perceptual study. F0-manipulated speech was synthesized from the clear speech (Experiment 1) and from the exaggerated-F0 speech (Experiment 2) to examine whether F0 variation is a contributing factor in the intelligibility benefit for native English speakers and for native Korean speakers. This was accomplished by compressing the F0 contours of clear speech to match those of conversational speech in Experiment 1 and by compressing the F0 contours of exaggerated-F0 speech to match those of conversational speech in Experiment 2. Listeners were presented with sentences in the different speaking styles in random order and asked to type each sentence after repeating it aloud. The percentage of correct keywords was calculated for each speaking style. The data revealed that F0 range did not contribute to the clear speech benefit. The exaggerated-F0 condition for the Korean talkers showed a slightly smaller intelligibility benefit than the clear speech condition. A follow-up study of speech naturalness revealed that clear speech is more natural than exaggerated-F0 speech; however, no significant correlation between intelligibility and speech naturalness was found. Although the experiments were designed to examine directly the role of F0 range in the clear speech benefit, the recordings and perceptual data provided opportunities to study other perceptual correlates of this phenomenon. The primary acoustic factor contributing to the clear speech benefit for native English and native Korean talkers was an increase in the intensity of high-frequency speech sounds.
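    The F0 manipulation described above amounts to rescaling the clear-speech (or exaggerated-F0) pitch contour so that its variation matches the conversational rendition. Below is a minimal Python sketch of that contour mapping, working in semitones about the contour median; the function name and the use of NumPy are illustrative assumptions, and the dissertation's actual resynthesis step (e.g., PSOLA-style pitch modification) is not reproduced here.

        import numpy as np

        def compress_f0_range(f0_hz, target_sd_semitones):
            """Rescale an F0 contour so its semitone SD matches a target value."""
            voiced = f0_hz > 0  # unvoiced frames conventionally carry 0 Hz
            median = np.median(f0_hz[voiced])
            st = 12 * np.log2(f0_hz[voiced] / median)      # semitones re: median
            scaled = st * (target_sd_semitones / np.std(st))
            out = f0_hz.copy()
            out[voiced] = median * 2 ** (scaled / 12)      # back to Hz
            return out

        # e.g., compress a clear-speech contour (a NumPy array of Hz values)
        # to a conversational-style 2-semitone standard deviation:
        # matched = compress_f0_range(clear_f0, target_sd_semitones=2.0)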

    Improving Eye Motion Sequence Recognition Using Electrooculography Based on Context-Dependent HMM

    Eye motion-based human-machine interfaces provide a means of communication for those who can move nothing but their eyes because of injury or disease. To detect eye motions, electrooculography (EOG) is used. For efficient communication, input speed is critical. However, it is difficult for conventional EOG recognition methods to accurately recognize fast, sequentially input eye motions because adjacent eye motions influence each other. In this paper, we propose a context-dependent hidden Markov model (HMM)-based EOG modeling approach that uses separate models for identical eye motions with different contexts. Because the influence of adjacent eye motions is explicitly modeled, higher recognition accuracy is achieved. Additionally, we propose a method of user adaptation based on a user-independent EOG model to investigate the trade-off between recognition accuracy and the amount of user-dependent data required for HMM training. Experimental results show that when the proposed context-dependent HMMs are used, the character error rate (CER) under user-dependent conditions is significantly reduced compared with the conventional baseline, from 36.0% to 1.3%. Although the CER rises to 17.3% when context-dependent but user-independent HMMs are used, it can be reduced to 7.3% by applying the proposed user adaptation method.
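    The core modeling idea is triphone-style context dependency: the same eye motion gets a separate HMM for each pair of neighboring motions, so the mutual influence of adjacent motions is captured in the model inventory itself. The Python sketch below illustrates such a labeling scheme; the label notation and the 'sil' boundary symbol are assumptions borrowed from speech recognition practice, not the paper's code.

        def context_dependent_labels(motions):
            """Map a motion sequence to triphone-style context-dependent units."""
            labels = []
            for i, m in enumerate(motions):
                left = motions[i - 1] if i > 0 else 'sil'                  # sequence start
                right = motions[i + 1] if i < len(motions) - 1 else 'sil'  # sequence end
                labels.append(f"{left}-{m}+{right}")
            return labels

        print(context_dependent_labels(['up', 'right', 'down']))
        # -> ['sil-up+right', 'up-right+down', 'right-down+sil']

    Each such unit would then be trained as its own HMM, exactly as context-dependent phone models are in speech recognition.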

    Automatic Speech Recognition and Its Application to Information Extraction

    This paper describes recent progress in speech recognition technology and the author's perspectives on it. Applications of speech recognition technology can be classified into two main areas: dictation and human-computer dialogue systems. In the dictation domain, automatic broadcast news transcription is being actively investigated, especially under the DARPA project. Broadcast news dictation technology has recently been integrated with information extraction and retrieval technology, and many application systems, such as automatic voice document indexing and retrieval systems, are under development. In the human-computer interaction domain, a variety of experimental systems for information retrieval through spoken dialogue are being investigated. In spite of the remarkable recent progress, we are still far from our ultimate goal of understanding free conversational speech uttered by any speaker in any environment. The paper also describes the most important research issues to attack in order to advance toward that goal of fluent speech recognition.

    Speech and Speaker Recognition Evaluation

    This chapter overviews techniques for evaluating speech and speaker recognition systems. It first describes the principles of the recognition methods and specifies the types of systems as well as their applications. Evaluation methods can be classified into subjective and objective methods; the chapter focuses on the latter. To compare and normalize the performance of different speech recognition systems, test set perplexity is introduced as a measure of the difficulty of each task. Objective evaluation methods for spoken dialogue systems and for transcription systems are each described. Speaker recognition can be classified into speaker identification and speaker verification, and most application systems fall into the speaker verification category. Since variation of speech features over time is a serious problem in speaker recognition, normalization and adaptation techniques are also described. Speaker verification performance is typically measured by the equal error rate, detection error trade-off (DET) curves, and a weighted cost value. The chapter concludes by summarizing various issues for future research.
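    Two of the measures named above are easy to make concrete. Test set perplexity is the geometric-mean inverse probability the language model assigns to the test words, and the equal error rate is the operating point where false acceptances and false rejections balance. The Python sketch below illustrates both under simplified assumptions (per-word log10 LM probabilities, and a brute-force threshold scan over raw verification scores); the function names are illustrative.

        import math

        def test_set_perplexity(logprobs_log10):
            """Perplexity = 10 ** (-(1/N) * sum of per-word log10 probabilities)."""
            return 10 ** (-sum(logprobs_log10) / len(logprobs_log10))

        # A uniform LM over a 1000-word vocabulary yields perplexity 1000:
        print(test_set_perplexity([math.log10(1 / 1000)] * 50))  # -> 1000.0

        def equal_error_rate(genuine, impostor):
            """Scan candidate thresholds for the point where FRR ~= FAR."""
            best_gap, eer = float("inf"), None
            for t in sorted(genuine + impostor):
                frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
                far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
                if abs(frr - far) < best_gap:
                    best_gap, eer = abs(frr - far), (frr + far) / 2
            return eer

        print(equal_error_rate([0.9, 0.8, 0.6], [0.1, 0.2, 0.7]))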

    Inference Network-based Indonesian Spoken Query Information Retrieval

    Query term misrecognition caused by the speech recognizer is one of the important issues in spoken query information retrieval: a misrecognized term in the transcribed query leads to the retrieval of irrelevant documents. To raise the ranking of correctly retrieved documents, we use a speech recognition confidence score based on word posterior probability to weight the terms in an inference network-based (IN-based) Indonesian information retrieval system. Our results show that this technique can improve the mean reciprocal rank (MRR) score of the retrieved documents.
    APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Automatic Speech Recognition (6 October 2009).
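    Two pieces of this abstract lend themselves to a short illustration: weighting each transcribed query term by its word posterior probability, and scoring retrieval quality with mean reciprocal rank. The Python sketch below shows both in deliberately simplified form; the dictionary-based weighting stands in for the paper's inference-network formulation, and all names are illustrative.

        def confidence_weighted_query(terms_with_posteriors):
            """E.g. [('jakarta', 0.93), ('banjir', 0.41)] -> per-term weights."""
            return {term: posterior for term, posterior in terms_with_posteriors}

        def mean_reciprocal_rank(ranked_lists, relevant):
            """MRR: average over queries of 1/rank of the first relevant document."""
            total = 0.0
            for qid, docs in ranked_lists.items():
                for rank, doc in enumerate(docs, start=1):
                    if doc in relevant[qid]:
                        total += 1.0 / rank
                        break
            return total / len(ranked_lists)

        print(mean_reciprocal_rank({'q1': ['d3', 'd1'], 'q2': ['d2', 'd5']},
                                   {'q1': {'d1'}, 'q2': {'d2'}}))
        # -> (1/2 + 1/1) / 2 = 0.75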

    Digital Speech Processing, Synthesis, and Recognition.


    Overview of the 21st Century COE Program "Framework for Systematization and Application of . . .

    This paper describes a new five-year COE program, "Framework for Systematization and Application of Large-scale Knowledge Resources," recently launched at Tokyo Institute of Technology. The project will conduct a wide range of interdisciplinary research combining humanities and technology to build a framework for the systematization and application of large-scale knowledge resources in electronic form. Spontaneous speech, written language, materials for e-learning and multimedia teaching, classical literature, historical documents, and information on cultural properties will be targeted as examples of actual knowledge resources, and they will be systematized based on the structure of their meanings. Pioneering new academic disciplines and educating knowledge-resource researchers are also goals of the project. Large-scale systems for computation, information storage, and retrieval will be installed for conducting the research and education.

    Digital Speech Processing, Synthesis, and Recognition 2/E

    A study of digital speech processing, synthesis, and recognition. This second edition contains new sections on the international standardization of robust and flexible speech coding techniques, waveform unit concatenation-based speech synthesis, large-vocabulary continuous speech recognition based on statistical pattern recognition, and more.